-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
application: extract path prefix logic to middleware #2733
Conversation
Summary: Currently, we apply the `_handle_errors` middleware as a decorator directly on `__call__`. We plan to add middleware that is parameterized on instance variables, so this pattern will no longer suffice. This commit simply refactors the existing code to apply middleware at initialization time. Test Plan: It suffices to run `bazel test //tensorboard/backend:application_test`. wchargin-branch: app-middleware wchargin-source: 77c4f793de8b64d88bd10212db5a0191995b29ca
Summary: This way, we can reason about and test it in isolation as we add more middleware to the application. Note that when using `--path_prefix`, loading the main TensorBoard page without a trailing slash on the URL has long been broken (#1117). This commit does not fix that, but changes the exact failure mode: rather than 404ing, we now load a broken TensorBoard (relative URLs don’t resolve). A follow-up commit will fix this properly. Test Plan: Verify that TensorBoard still works fully, with `--path_prefix` and without it. wchargin-branch: path-prefix-middleware wchargin-source: 3e07d2b681f12c4bab310bef2afc066590ae5cb7
Summary: Specifying a path prefix without leading slash has always been disallowed by the docs, but in TensorBoard 2.0.0 this just hits an inscrutable internal error when starting the server: ``` ValueError: invalid literal for int() with base 10: '6006my_path_prefix' ``` As of #2733, this is at least a correctly reported error, but it still has a stack trace. This commit makes it a normal, fast-failing usage error. The logic to strip trailing slashes was also duplicated, so `/foo` and `/foo/` and `/foo//` were all the same, but differed from `/foo///`. This commit removes one occurrence and makes the remaining one remove all trailing slashes instead of just one. Test Plan: Run with `--path_prefix /foo`, and note that TensorBoard exits with a single-line error. Run with `/foo`, `/foo/`, `/foo//`, and `/foo///`, and note that all four now have the same behavior. wchargin-branch: path-prefix-validation wchargin-source: 514c3bc317a1b90561f29f378919424b4284cc66
|
||
with self.subTest("at root"): | ||
response = server.get("/pfx") | ||
self._assert_ok(response, path="", script="/pfx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no action required; I simply dunno: how does our server respond when the path is "" vs. "/"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The answer is that Werkzeug normalizes that out, so our
app (which uses Werkzeug for routing) only sees request.path == "/"
:
# main.py
import werkzeug
def bare_app(environ, start_response):
start_response("200 OK", [("Content-Type", "text/plain")])
return [repr(environ.get("PATH_INFO")).encode("utf-8")]
@werkzeug.Request.application
def werkzeug_app(request):
return werkzeug.Response(repr(request.path), content_type="text/plain")
start_response = lambda status, headers: None
request_empty = {"REQUEST_METHOD": "GET", "PATH_INFO": ""}
request_root = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
print("With base WSGI:")
print(list(bare_app(request_empty, start_response)))
print(list(bare_app(request_root, start_response)))
print("With Werkzeug:")
print(list(werkzeug_app(request_empty, start_response)))
print(list(werkzeug_app(request_root, start_response)))
$ python main.py
With base WSGI:
[b"''"]
[b"'/'"]
With Werkzeug:
[b"'/'"]
[b"'/'"]
wchargin-branch: path-prefix-middleware wchargin-source: e74bb6f8dd0aef732604991f63d1803b0ce2a206 # Conflicts: # tensorboard/backend/application.py
wchargin-branch: path-prefix-middleware wchargin-source: e74bb6f8dd0aef732604991f63d1803b0ce2a206
from tensorboard import errors | ||
|
||
|
||
class PathPrefixMiddleware(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had been thinking we could generalize this as taking a dict mapping prefixes to WSGI apps. This unblocks using this for the plugin routing as well as described in #2573.
Feel free to leave this PR as-is, but just FYI, I think it will make sense to fold this into a more generic middleware sooner or later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
path_prefix: A string path prefix to be stripped from incoming | ||
requests. If empty, this middleware is a no-op. If non-empty, | ||
the path prefix must start with a slash and not end with one | ||
(e.g., "/tensorboard"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the semantics are more obvious if we require that the path prefix end in a slash, since then we don't necessarily even need to support the redirect, and there's no confusion about what the canonical URL will be.
WDYT? I think we could fairly consider that not to be a breaking change given that using the prefix without the trailing slash is broken today due to the lack of redirect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m fine with making that change at the user level (i.e., a requirement
on the flag value). As an implementation detail, I find it useful to
represent path components like this with a leading slash and no trailing
slash, because then they concatenate properly. This is also what
Werkzeug does for the .script_root
property (they don’t expose
SCRIPT_NAME
correctly), and is the convention that application.py
uses for constants like DATA_PREFIX = "/data"
.
Given that this PR doesn’t actually make any user-facing changes modulo
existing undefined behavior, I’m comfortable deferring the tightening of
the flag domain. Let me know if you object.
Summary: Specifying a path prefix without leading slash has always been disallowed by the docs, but in TensorBoard 2.0.0 this just hits an inscrutable internal error when starting the server: ``` ValueError: invalid literal for int() with base 10: '6006my_path_prefix' ``` As of #2733, this is at least a correctly reported error, but it still has a stack trace. This commit makes it a normal, fast-failing usage error. The logic to strip trailing slashes was also duplicated, so `/foo` and `/foo/` and `/foo//` were all the same but differed from `/foo///`. This commit removes one copy of the logic and makes the remaining copy strip all trailing slashes instead of just one. Test Plan: Run with `--path_prefix /foo`, and note that TensorBoard exits with a single-line error. Run with `/foo`, `/foo/`, `/foo//`, and `/foo///`, and note that all four now have the same behavior. wchargin-branch: path-prefix-validation
Summary:
This way, we can reason about and test it in isolation as we add more
middleware to the application.
Note that when using
--path_prefix
, loading the main TensorBoard pagewithout a trailing slash on the URL has long been broken (#1117). This
commit does not fix that, but changes the exact failure mode: rather
than 404ing, we now load a broken TensorBoard (relative URLs don’t
resolve). A follow-up commit will fix this properly.
Test Plan:
Verify that TensorBoard still works fully, with
--path_prefix
andwithout it.
wchargin-branch: path-prefix-middleware